NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Contrastive Learning with Data Misalignment: Feature Purity, Training Dynamics and Theoretical Generalization Guarantees

Sun, Jiawei; Zhang, Shuai; Li, Hongkang; Wang, Meng (December 2025, The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS))

Full Text Available
Contrastive Learning with Data Misalignment: Feature Purity, Training Dynamics and Theoretical Generalization Guarantees

Sun, Jiawei; Zhang, Shuai; Li, Hongkang; Wang, Meng (September 2025, The Thirty-Ninth Annual Conference on Neural Information Processing Systems)

Contrastive learning is a powerful framework for learning discriminative representations from image-text pairs. Despite its success, its theoretical foundations, especially when the image-text pair exhibits misalignment, remain underexplored. This paper provides the first theoretical analysis of contrastive learning under data misalignment, proving how the ground-truth modality-paired features are amplified while spurious features are suppressed through the training dynamics analysis. Specifically, we study two nonlinear encoders trained jointly with a contrastive loss and demonstrate that noisy (or misaligned) data pairs result in mixed representations and degrade the model's generalization ability. In contrast, recaptioning and filtering improve the data alignment, which in turn purifies the features learned by neurons and subsequently enhances generalization. Our analysis identifies feature purity as a key factor in the success of contrastive learning and offers insights into how data quality and training procedures impact representation learning and downstream generalization. Theoretical insights are supported by experiments on standard benchmarks.
more » « less
Full Text Available
Theoretical Learning Performance of Graph Networks: the Impact of Jumping Connections and Layer-wise Sparsification

Sun, Jiawei; Li, Hongkang; Wang, Meng (June 2025, Transactions on Machine Learning Research)

Full Text Available
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Li, Hongkang; Lu, Songtao; Chen, Pin-Yu; Cui, Xiaodong; Wang, Meng (April 2025, The Thirteenth International Conference on Learning Representations (ICLR))

Full Text Available
When is task vector provably effective for model editing? a generalization analysis of nonlinear transformers

Li, Hongkang; Zhang, Yihua; Zhang, Shuai; Wang, Meng; Liu, Sijia; Chen, Pin-Yu (May 2025, 2025 International Conference on Learning Representations (ICLR))

Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors, each of which is the weight update from the pre-trained model to fine-tuned models for certain tasks. This approach recently gained attention as a computationally efficient inference method for model editing, e.g., multi-task learning, forgetting, and out-of-domain generalization capabilities. However, the theoretical understanding of why task vectors can execute various conceptual operations remains limited, due to the highly non-convexity of training Transformer-based models. To the best of our knowledge, this paper provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. We consider a conceptual learning setting, where each task is a binary classification problem based on a discriminative pattern. We theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, as well as the success of task negation in unlearning one task from irrelevant or contradictory tasks. Moreover, we prove the proper selection of linear coefficients for task arithmetic to achieve guaranteed generalization to out-of-domain tasks. All of our theoretical results hold for both dense-weight parameters and their low-rank approximations. Although established in a conceptual setting, our theoretical findings were validated on a practical machine unlearning task using the large language model Phi-1.5 (1.3B).
more » « less
Full Text Available
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Li, Hongkang; Zhang, Yihua; Zhang, Shuai; Chen, Pin-Yu; Liu, Sijia; Wang, Meng (April 2025, The Thirteenth International Conference on Learning Representations (ICLR))

Full Text Available
A Theoretical Understanding of Vision Transformers: Learning, Generalization, and Sample Complexity

Li Hongkang Li; Wang, Meng Wang; Liu, Sijia; Chen, Pin-Yu. (May 2023, Proc. the Eleventh International Conference on Learning Representations (ICLR))

Full Text Available
Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

https://doi.org/10.1109/CISS53076.2022.9751184

Li, Hongkang; Zhang, Shuai; Wang, Meng Wang (January 2022, Proc. 2022 56th Annual Conference on Information Sciences and Systems (CISS))

Full Text Available
Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling

Li, Hongkang Li; Wang, Meng; Liu, Sijia; Chen, Pin-Yu; Xiong, Jinjun Xiong. (January 2022, 2022 International Conference on Machine Learning (ICML))

Full Text Available

Search for: All records